We measure the projected correlation function $w_p(r_p)$ from the Sloan Digital Sky
Survey for a flux-limited sample of 118,000 galaxies and for a volume-limited subset of
22,000 galaxies with absolute magnitude $M_r < -21$. Both correlation functions show
subtle but systematic departures from the best-fit power law, in particular a change
in slope at $r_p \sim 1-2\,h^{-1}$ Mpc. These departures are stronger for the volume-limited
sample, which is restricted to relatively luminous galaxies. We show that the inflection
point in $w_p(r_p)$ can be naturally explained by contemporary models of galaxy clustering,
according to which it marks the transition from a large scale regime dominated by
galaxy pairs in separate dark matter halos to a small scale regime dominated by galaxy
pairs in the same dark matter halo. For example, given the dark halo population
predicted by an inflationary cold dark matter scenario, the projected correlation function
of the volume-limited sample can be well reproduced by a model in which the mean
number of $M_r < -21$ galaxies in a halo of mass $M > M_1 = 4.74 \times 10^{13}\,h^{-1} M_\odot$ is
$\langle N \rangle_M = (M/M_1)^{0.89}$, with 75% of the galaxies residing in less massive, single-galaxy
halos, and simple auxiliary assumptions about the spatial distribution of galaxies within
halos and the fluctuations about the mean occupation. This physically motivated model
has the same number of free parameters as a power law, and it fits the $w_p(r_p)$ data better,
with a $\chi^2$/d.o.f. = 0.93 compared to 6.12 (for 10 degrees of freedom, incorporating the
covariance of the correlation function errors). Departures from a power-law correlation
function encode information about the relation between galaxies and dark matter halos.
Higher precision measurements of these departures for multiple classes of galaxies will
constrain galaxy bias and provide new tests of the theory of galaxy formation.

Subject headings: cosmology: observations — cosmology: theory — galaxies: distances 
and redshifts — galaxies: fundamental parameters — galaxies: statistics — large-scale 
structure of universe 

1. Introduction 

One of the longest standing quantitative results in the study of galaxy clustering is the power-law
form of the two-point correlation function $\xi(r)$ (Totsuji & Kihara 1969; Peebles 1974; Gott
& Turner 1979). For many years this result rested mainly on the angular correlation function of
imaging catalogs, measured with steadily increasing precision and dynamic range. More recently,
analyses of the projected correlation function $w_p(r_p)$ in large galaxy redshift surveys have confirmed
that the real space galaxy correlation function is close to a power law on small scales (e.g., Davis
& Peebles 1983; Fisher et al. 1994; Marzke et al. 1995; Jing, Mo, & Börner 1998; Norberg et
al. 2001; Jing, Börner, & Suto 2002; Zehavi et al. 2002, hereafter Z02). The angular correlation
function (as well as the redshift-space correlation function) breaks below a power law at large scales
($\gtrsim 10-20\,h^{-1}$ Mpc; Groth & Peebles 1977; Maddox et al. 1990; Jing, Börner, & Suto 2002), and
there are hints of a "shoulder" in $\xi(r)$ at scales of several $h^{-1}$ Mpc (Dekel & Aarseth 1984; Guzzo
et al. 1991; Calzetti, Giavalisco, & Meiksin 1992; Baugh 1996; Gaztañaga & Juszkiewicz 2001;
Gaztañaga 2002; Hawkins et al. 2003). There have also been some hints of departures from a power
law at smaller scales (e.g., Connolly et al. 2002), but the significance of these has been difficult to
evaluate for two reasons: they are usually measured in the angular correlation function and are
thus integrated over a wide range of galaxy luminosities and redshifts, and the statistical errors in
correlation function estimates are themselves correlated in a complex way.

It has become increasingly clear that leading cosmological models do not predict a power-law
$\xi(r)$ for the dark matter. For the ΛCDM model (inflationary cold dark matter with a cosmological
constant), the matter correlation function rises above a best-fit power law on scales $r \lesssim 1\,h^{-1}$ Mpc
and falls below it again on scales $r \lesssim 0.2\,h^{-1}$ Mpc (Jenkins et al. 1998, and references therein;
$h \equiv H_0/100$ km s$^{-1}$ Mpc$^{-1}$). Semi-analytic and hydrodynamic simulation models of galaxy formation,
and high resolution N-body simulations that identify galaxies with sub-halos inside larger virialized
objects, predict a scale-dependent bias that makes the galaxy correlation function much closer
to the observed power law, a significant success of these galaxy formation models in the context
of ΛCDM (Colín et al. 1999; Kauffmann et al. 1999; Pearce et al. 1999; Benson et al. 2000; Cen
& Ostriker 2000; Somerville et al. 2001; Yoshikawa et al. 2001; Weinberg et al. 2004). However,
while the general form of this bias can be understood qualitatively in terms of the physics of
galaxy assembly (Kauffmann, Nusser, & Steinmetz 1997; Kauffmann et al. 1999; Benson et al. 2000;
Berlind et al. 2003), the emergence of a power law $\xi(r)$ is largely fortuitous. In particular, there
is a transition from a large scale regime in which pairs come from separate dark matter halos to a
small scale regime in which pairs come from the same halo, and a power law correlation function
requires coincidental alignment of these two terms.¹ Thus, the best contemporary models of galaxy
clustering predict that sufficiently high precision measurements of the correlation function should,
eventually, show departures from a power law.

Here we present measurements of $w_p(r_p)$ from the main galaxy redshift sample of the Sloan
Digital Sky Survey (SDSS; York et al. 2000). The correlation function of the flux-limited sample
shows small but systematic deviations from a power law. When we measure $w_p(r_p)$ for a
volume-limited sample of relatively luminous galaxies ($M_r < -21$, $L > 1.5 L_*$), we find deviations of similar
form and larger amplitude. In addition to establishing the existence of these deviations, a second
goal of this paper is to introduce new techniques for modeling the projected correlation function
in terms of the relation between galaxies and halos, extending the approach of Jing, Mo, & Börner
(1998) and Jing, Börner, & Suto (2002) and building on theoretical work by Ma & Fry (2000),
Peacock & Smith (2000), Seljak (2000), Scoccimarro et al. (2001), and Berlind & Weinberg (2002).
We concentrate our modeling effort on the volume-limited sample, since it constitutes a well defined
class of galaxies. We show that the departures of the measured $w_p(r_p)$ from a power law can be
naturally explained by the predicted transition from a 2-halo regime on large scales to a 1-halo
regime on small scales. We will examine the dependence of $w_p(r_p)$ on galaxy luminosity and color,
and the implications of this dependence for galaxy-halo relations, in a separate paper (I. Zehavi et
al., in preparation).

¹ Throughout this paper we use the term "halo" to refer to a gravitationally bound structure with overdensity
$\rho/\bar{\rho} \sim 200$, so an occupied halo may host a single luminous galaxy, a group of galaxies, or a cluster. Higher overdensity
concentrations around individual galaxies of a group or cluster constitute, in this terminology, halo substructure or
"sub-halos."

2. Observations and Analysis 

The SDSS uses a mosaic CCD camera (Gunn et al. 1998) to image the sky in five photometric
bandpasses (Fukugita et al. 1996), denoted $u$, $g$, $r$, $i$, $z$.² After astrometric calibration (Pier et
al. 2003), photometric data reduction (R. H. Lupton et al., in preparation; see Lupton et al. 2001
and Stoughton et al. 2002 for summaries), and photometric calibration (Hogg et al. 2001; Smith
et al. 2002), galaxies are selected for spectroscopic observations using the algorithm described by
Strauss et al. (2002). To a good approximation, the main galaxy sample consists of all galaxies
with r-band apparent magnitude $r < 17.77$; the analysis in this paper does not include galaxies in
the luminous red galaxy sample described by Eisenstein et al. (2001). Spectroscopic observations
are performed with a pair of fiber-fed CCD spectrographs (A. Uomoto et al., in preparation), with
targets assigned to spectroscopic plates by an adaptive tiling algorithm (Blanton et al. 2003a). An
important operational constraint is that no two fibers on the same plate can be closer than 55 arcsec
(so-called fiber collisions, affecting $\sim 7\%$ of the galaxies). Spectroscopic data reduction and redshift
determination are performed by automated pipelines (D. J. Schlegel et al., in preparation; J. A.
Frieman et al., in preparation), with rms galaxy redshift errors $\sim 30$ km s$^{-1}$.

The clustering measurements in this paper are based on a subset of the SDSS galaxy redshift
data with well characterized completeness, known as Large Scale Structure sample10, which is
described in detail by Blanton et al. (2003c). LSS sample10 is based on data obtained prior to
April 2002, and it contains 144,609 main sample galaxies. The radial selection function incorporates
the luminosity evolution model of Blanton et al. (2003c) and the improved K-corrections of Blanton
et al. (2003b, using kcorrect v1_11). We K-correct the observed frame magnitudes in the SDSS
bands to rest frame magnitudes for those bands blueshifted by $z = 0.1$, so that the K-correction
is trivial for a galaxy at $z = 0.1$ (near the median redshift of the survey). The one photometric
quantity of importance to this paper is the absolute magnitude in the redshifted $r$ band, which we
compute for $h = 1$ and denote $M_{^{0.1}r}$ (so that the true absolute magnitude is $M_{^{0.1}r} + 5 \log h$). We
will focus most of our attention on a volume-limited galaxy sample with $M_{^{0.1}r} < -21$, a threshold
that is 0.56 magnitudes brighter than the characteristic Schechter (1976) function luminosity $M^*_{^{0.1}r}$
found by Blanton et al. (2003c). For all absolute-magnitude and distance calculations, we adopt a
cosmological model with $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$.
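
As a small worked illustration of the $h$ convention described above, the following Python sketch converts a magnitude computed for $h = 1$ to another value of $h$. The function name and the sample value $h = 0.7$ are illustrative assumptions and are not part of the survey pipeline.

```python
import math

def absolute_magnitude_for_h(mag_h1, h):
    """Convert an absolute magnitude computed for h = 1 to a chosen h.

    Implements M_true = M(h=1) + 5 log10(h), the relation quoted in the text.
    """
    return mag_h1 + 5.0 * math.log10(h)

# Illustrative value only: the M < -21 threshold evaluated for h = 0.7.
threshold_h07 = absolute_magnitude_for_h(-21.0, 0.7)
```

Because $5 \log h$ is negative for $h < 1$, the same threshold corresponds to a brighter true absolute magnitude for any $h$ below unity.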

² Fukugita et al. (1996) actually define a slightly different system, denoted $u'$, $g'$, $r'$, $i'$, $z'$, but SDSS magnitudes are
now referred to the native filter system of the 2.5-m survey telescope, for which the bandpass notation is unprimed.

Our methods for measuring the galaxy correlation function are essentially the same as those
of Z02, to which we refer the reader for a detailed description and tests. In brief, we create random
catalogs using the survey angular selection function and the radial selection function appropriate
to the galaxy sample under consideration. We calculate $\xi(r_p, \pi)$, the correlation function as a
function of separation perpendicular ($r_p$) and parallel ($\pi$) to the line-of-sight, by counting
data-data, data-random, and random-random pairs and using the Landy & Szalay (1993) estimator. We
then compute the projected correlation function $w_p(r_p)$,

$$w_p(r_p) = 2 \int_0^{\pi_{\max}} d\pi\, \xi(r_p, \pi). \qquad (1)$$

We adopt $\pi_{\max} = 4000$ km s$^{-1}$ = 40 $h^{-1}$ Mpc, which is large enough to include essentially all
significant signal at the values of $r_p$ of interest here (0.1 $h^{-1}$ Mpc $< r_p <$ 20 $h^{-1}$ Mpc) while
suppressing noise from uncorrelated structure at very large line-of-sight separations. We account
for spectroscopic fiber collisions by assigning to each "collided" (and thus unobserved) galaxy the
same redshift as the observed galaxy responsible for the collision. The main advances relative to
Z02 are the much larger data sample, the improved model of the radial selection function, and the
improved error estimates discussed below.
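
The measurement steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the survey code: it assumes the pair counts in a single $(r_p, \pi)$ bin are already available, uses the standard Landy & Szalay form $(DD - 2DR + RR)/RR$ with each count normalized by its total number of pairs, and approximates the line-of-sight integral by a sum over equal-width $\pi$ bins. All function and argument names are hypothetical.

```python
def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy & Szalay estimator for one (r_p, pi) bin.

    dd, dr, rr are raw pair counts; each is normalized by the total number
    of pairs of its kind before forming (DD - 2 DR + RR) / RR.
    """
    dd_norm = dd / (n_data * (n_data - 1) / 2.0)
    dr_norm = dr / (n_data * n_rand)
    rr_norm = rr / (n_rand * (n_rand - 1) / 2.0)
    return (dd_norm - 2.0 * dr_norm + rr_norm) / rr_norm

def projected_wp(xi_along_pi, d_pi):
    """Approximate w_p = 2 * integral of xi(r_p, pi) d(pi) from 0 to pi_max.

    xi_along_pi holds xi in equal-width pi bins of width d_pi (h^-1 Mpc)
    that together span 0 to pi_max.
    """
    return 2.0 * sum(xi_along_pi) * d_pi
```

For example, four bins of width 10 $h^{-1}$ Mpc span the adopted $\pi_{\max} = 40\,h^{-1}$ Mpc, so `projected_wp([1.0, 0.5, 0.25, 0.0], 10.0)` sums the signal over that range and doubles it to account for both signs of $\pi$.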

Figure 1a shows the projected correlation function $w_p(r_p)$ of a flux-limited subset of sample10.
We restrict this subset to galaxies in the absolute-magnitude range $-19 > M_{^{0.1}r} > -22$ and
the redshift range $0.02 < z < 0.16$, so that we avoid galaxies at the extremes of the luminosity
distribution and minimize the effect of redshift evolution within the sample. Note, however, that
the sample is not volume-limited, so that not all galaxies within the absolute-magnitude limits can
be seen over the full redshift range. With our adopted redshift, absolute-magnitude, and angular
limits, the flux-limited catalog contains 118,149 galaxies. In Figure 1a, the statistical error bars on
the data points are estimated via the jackknife resampling procedure used by Z02. We define 104
geometrically contiguous subsamples of the full data set, each covering approximately 20 square
degrees on the sky, then estimate error bars from the total dispersion among the 104 jackknife
samples that are created by omitting each of these subsamples in turn (Z02, eq. 7).
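
The jackknife procedure can be illustrated with a short Python sketch. It uses the standard delete-one jackknife covariance, with the $(N-1)/N$ prefactor on the summed dispersion, which we assume corresponds to the estimator cited as Z02 eq. 7; the function name and input layout are hypothetical.

```python
def jackknife_covariance(samples):
    """Delete-one jackknife covariance matrix of w_p.

    samples[k][i] is w_p in r_p bin i, measured with subsample k omitted.
    Returns C[i][j] = (N - 1) / N * sum_k (w_i^k - mean_i) (w_j^k - mean_j).
    """
    n = len(samples)
    n_bins = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(n_bins)]
    scale = (n - 1) / n
    return [
        [scale * sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples)
         for j in range(n_bins)]
        for i in range(n_bins)
    ]
```

The error bar on bin `i` is the square root of `C[i][i]`, and the off-diagonal elements carry the correlations between bins that enter the fits.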

The integration over line-of-sight separations makes $w_p(r_p)$ independent of redshift-space
distortions. In this respect, it resembles the angular correlation function $w(\theta)$, but because $w_p(r_p)$
makes use of the known redshifts of each pair of galaxies, it is a much more sensitive measure
(for a given number of galaxies) of the real space correlation function $\xi(r)$ (Davis & Peebles 1983;
Hamilton & Tegmark 2002). The general relation between $w_p(r_p)$ and $\xi(r)$ is

$$w_p(r_p) = 2 \int_0^{\infty} dy\, \xi\left[(r_p^2 + y^2)^{1/2}\right], \qquad (2)$$

from which one can see that a power-law $\xi(r)$ projects into a power-law $w_p(r_p)$. The solid line in
Figure 1a shows a power-law fit to the $w_p(r_p)$ data points, corresponding to a real space correlation
function $\xi(r) = (r/5.77\,h^{-1}\,\mathrm{Mpc})^{-1.80}$. Statistical errors in the correlation function are strongly
correlated because each coherent structure contributes pairs at many different separations, and the
solid-line fit utilizes the full covariance matrix estimated by jackknife resampling. If we ignore the
error correlations and use only the diagonal elements of the covariance matrix, we obtain the slightly
shallower power law shown by the dotted line, which corresponds to $\xi(r) = (r/5.91\,h^{-1}\,\mathrm{Mpc})^{-1.78}$.
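
To make the power-law projection concrete, the Python sketch below evaluates the standard closed form for $\xi(r) = (r/r_0)^{-\gamma}$ integrated along the line of sight to infinity, $w_p(r_p) = r_p (r_0/r_p)^{\gamma}\, \Gamma(1/2)\, \Gamma[(\gamma-1)/2] / \Gamma(\gamma/2)$. The infinite upper limit is an assumption of this illustration; the measurements in the text truncate the integral at 40 $h^{-1}$ Mpc, so the two differ slightly at large $r_p$. The function name is hypothetical.

```python
import math

def wp_power_law(r_p, r_0, gamma):
    """Projected correlation function of xi(r) = (r / r_0)**(-gamma).

    Assumes the line-of-sight integral extends to infinity and gamma > 1.
    r_p and r_0 share the same length unit (e.g. h^-1 Mpc).
    """
    amplitude = (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
                 / math.gamma(gamma / 2.0))
    return r_p * (r_0 / r_p) ** gamma * amplitude

# Fit quoted in the text for the flux-limited sample, evaluated at 1 h^-1 Mpc.
wp_at_1 = wp_power_law(1.0, 5.77, 1.80)
```

The result scales as $r_p^{1-\gamma}$, which is why a power-law $\xi(r)$ with slope $-\gamma$ appears as a straight line of slope $1-\gamma$ in a log-log plot of $w_p(r_p)$.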




Fig. 1. Projected correlation function $w_p(r_p)$ for the flux-limited redshift sample (left) and the
volume-limited subset of galaxies with $M_{^{0.1}r} < -21$ (right). For the flux-limited sample, error bars
and their covariance matrix are estimated by jackknife resampling of the data set, while for the
volume-limited sample they are estimated from mock catalogs as described in the text. In each
panel, solid lines show maximum-likelihood power-law fits that incorporate the full error covariance
matrix, and dotted lines show least-squares fits that ignore the error correlations.

While the data points in Figure 1a lie close to the best-fit power laws, they do not scatter
randomly above and below. Instead, they criss-cross the fits, and they have a steeper logarithmic
slope at 0.5 $h^{-1}$ Mpc $< r <$ 1.5 $h^{-1}$ Mpc and a shallower slope at 1.5 $h^{-1}$ Mpc $< r <$ 4 $h^{-1}$ Mpc.
The $\chi^2$ of the fit, estimated using the jackknife covariance matrix, is 31.8 for ten degrees of freedom
(12 data points minus 2 parameters), which suggests that these departures are statistically
significant. (Even though the data points are correlated, it is correct to count one degree of freedom per
data point because we use the full covariance matrix in evaluation of $\chi^2$.) However, the physical
implications of these departures are difficult to assess because galaxy clustering is known to
depend on luminosity (Norberg et al. 2001; Z02, and references therein), and the flux-limited sample
contains a different mix of galaxies at different redshifts and does not represent the clustering of
any well defined class. Furthermore, while the tests in Z02 (and tests that we have conducted
subsequently) show that the jackknife method yields reasonable estimates of the error covariance
matrix on average, statistical noise in these estimates can lead to an inaccurate inverse matrix and
consequently inaccurate $\chi^2$ estimates.
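
The role of the covariance matrix in $\chi^2$ can be illustrated with a short Python sketch that evaluates $\chi^2 = d^T C^{-1} d$ for the residual vector $d$ between data and model. Rather than forming the inverse explicitly, it solves $C y = d$ by Gaussian elimination, which is a choice made for this illustration and not a description of the analysis code; all names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def chi2_full(data, model, cov):
    """chi^2 = d^T C^-1 d using the full covariance matrix."""
    resid = [a - b for a, b in zip(data, model)]
    y = solve_linear(cov, resid)
    return sum(r * v for r, v in zip(resid, y))

def chi2_diagonal(data, model, cov):
    """chi^2 using only the diagonal elements (ignores error correlations)."""
    return sum((a - b) ** 2 / cov[i][i]
               for i, (a, b) in enumerate(zip(data, model)))
```

With two points that are both one standard deviation high and have a correlation coefficient of 0.5, the full calculation gives $\chi^2 = 4/3$ while the diagonal-only version gives 2, which shows how positively correlated errors make coherent offsets less significant than they look.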

To address both of these problems, we measure the projected correlation function of a
volume-limited subset of galaxies with $M_{^{0.1}r} < -21$ and the same redshift range $0.02 < z < 0.16$. All
21,659 galaxies in this subset are luminous enough to be seen over the full redshift range. To
obtain low noise estimates of the covariance matrix, we now create 100 mock catalogs with the
same geometry, completeness as a function of sky position, and galaxy number density as this
volume-limited sample, using the PTHalos method of Scoccimarro & Sheth (2002). The input
parameters for these catalogs are chosen based on the model described in §4, with the consequence
that the average $w_p(r_p)$ of these mock catalogs is close to the observed value. Thus, this covariance
matrix should be appropriate for fitting models to these data and for assessing the statistical
acceptability of fits. To account for the small residual mismatch between the mock catalog and
observed $w_p(r_p)$, we rescale covariance matrix elements $C_{ij}$ by the ratio of the observed and mock
$w_p(r_{p,i})\, w_p(r_{p,j})$, in effect assuming that the mock catalogs most accurately predict the fractional
rather than absolute errors in $w_p(r_p)$. However, our conclusions would be no different if we did not
apply this scaling.
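
The covariance rescaling described above amounts to a simple element-wise operation, sketched here in Python. The function and argument names are hypothetical, and the inputs are assumed to be the mock covariance matrix together with the mean mock and observed $w_p$ values in the same $r_p$ bins.

```python
def rescale_covariance(cov_mock, wp_mock, wp_obs):
    """Rescale mock covariances so that fractional errors are preserved.

    Each element C_ij is multiplied by the ratio of the observed to mock
    product w_p(r_p,i) * w_p(r_p,j).
    """
    n = len(wp_obs)
    return [
        [cov_mock[i][j] * (wp_obs[i] * wp_obs[j]) / (wp_mock[i] * wp_mock[j])
         for j in range(n)]
        for i in range(n)
    ]
```

If the observed $w_p$ in one bin is twice the mock value, the variance in that bin is multiplied by four, so the fractional error in that bin is unchanged.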

Figure 1b shows $w_p(r_p)$ of the $M_{^{0.1}r} < -21$ sample, with error bars on the data points
representing the square root of the diagonal elements of the covariance matrix estimated from the mock
catalogs. The dotted line shows a power-law fit that incorporates only these diagonal elements, while the
solid line shows a maximum-likelihood fit that uses the full covariance matrix. The corresponding
real space correlation functions are $\xi(r) = (r/6.40\,h^{-1}\,\mathrm{Mpc})^{-1.89}$ and $\xi(r) = (r/5.91\,h^{-1}\,\mathrm{Mpc})^{-1.93}$,
respectively. Since the error correlations for the large scale data points are particularly strong, the
full maximum-likelihood fit puts more effective weight on the data points at smaller scales, yielding
a steeper power law.

Relative to the power-law fits, the data points in Figure 1b show the same systematic departures
seen for the flux-limited sample but in exaggerated form, especially the marked change in slope at
$r_p \sim 2\,h^{-1}$ Mpc. We find deviations of similar form for most other volume-limited SDSS samples (I.
Zehavi et al., in preparation), but the deviations are stronger for the relatively luminous galaxies
selected by the $M_{^{0.1}r} < -21$ threshold. Deviations of similar form are seen in the projected
correlation function of the 2dF Galaxy Redshift Survey (2dFGRS), as shown by Hawkins et al.
(2003, see their Figure 9), who comment that the deviation "probably is a real feature," though
they do not assess the significance quantitatively or discuss the physical interpretation in detail.
The existence of these deviations in flux- and volume-limited subsets of the r-band limited SDSS
and in the independent, $b_J$-band limited 2dFGRS demonstrates their observational robustness,
though their magnitude does depend on galaxy luminosity and color.

The $\chi^2$ for the solid-line fit in Figure 1b, based on the full covariance matrix and thus
accounting for the correlation of errors, is 61.2 for 10 degrees of freedom, or $\chi^2$/d.o.f. = 6.12. (A similar
fit with $\chi^2$/d.o.f. = 4.37 is obtained with the jackknife covariance matrix.) We now show that a
physically motivated model with the same number of free parameters provides a significantly better
fit to the data.

3. Modeling the Correlation Function 

To model $w_p(r_p)$ in a way that accounts for non-linear cosmological evolution and the
potentially complex relation between galaxies and mass, we adopt the general framework of the "halo
occupation distribution" (HOD) and use a modified form of the calculational methods introduced
by Ma & Fry (2000), Peacock & Smith (2000), Seljak (2000), Scoccimarro et al. (2001), and Berlind
& Weinberg (2002). The HOD framework characterizes galaxy "bias" in terms of the probability
$P(N|M)$ that a halo of virial mass $M$ contains $N$ galaxies of a specified class, together with
additional prescriptions that specify the relative distributions of galaxies and dark matter within halos.
In addition to allowing us to understand the power law deviations found above, HOD modeling
transforms $w_p(r_p)$ measurements into the language of contemporary cosmological models and galaxy
formation theories, which respectively predict the properties of the dark halo population (e.g., Press
& Schechter 1974; Mo & White 1996; Sheth, Mo, & Tormen 2001; Jenkins et al. 2001) and the
occupation statistics of galaxies (e.g., Kauffmann et al. 1997; Kauffmann et al. 1999; Benson et al.
2000; Somerville et al. 2001; White, Hernquist, & Springel 2001; Yoshikawa et al. 2001; Berlind et
al. 2003; Kravtsov et al. 2004). While power law fits to correlation functions are more familiar,
we consider the HOD fitting approach adopted here to be more physically natural, in addition to
providing a better description of the data. Magliocchetti & Porciani (2003) have recently applied
a similar approach to interpretation of clustering data from the 2dFGRS.

We start with the halo population predicted by a ΛCDM model, with parameters $\Omega_m = 0.3$,
$\Omega_\Lambda = 0.7$, $h = 0.7$, $n = 1$, $\sigma_8 = 0.9$, using the Efstathiou, Bond, & White (1992) form of the
CDM transfer function with parameter $\Gamma = 0.21$. These choices provide a reasonable match to a
wide variety of cosmological observations, including the shapes of the 2dFGRS and SDSS galaxy
power spectra at large scales where the bias is expected to be scale-independent (Percival et al.
2001; Tegmark et al. 2004). We compute the galaxy correlation function $\xi(r)$ as a sum of two
terms, one representing pairs of galaxies that reside within the same dark matter halo, the other
representing pairs in separate halos. We obtain the projected correlation function $w_p(r_p)$ from $\xi(r)$
via equation (2) (with the same upper limit of 40 $h^{-1}$ Mpc as in the measurements).

The 1-halo term is obtained by integrating over the Jenkins et al. (2001) halo mass function,
weighting each halo of mass $M$ by the mean number of galaxy pairs $\langle N(N-1) \rangle_M$. We assume that
each dark halo has an NFW profile (Navarro, Frenk, & White 1996) with $c(M) = 11\,(M/M_*)^{-0.13}$
(Bullock et al. 2001), where $c$ is the NFW halo concentration parameter and $M_* = 1.07 \times 10^{13}\,h^{-1} M_\odot$
is the non-linear mass scale for our adopted cosmological parameters.³ Motivated by hydrodynamic
simulation results (White, Hernquist, & Springel 2001; Berlind et al. 2003), we assume that the
first galaxy in each occupied halo resides at the halo center-of-mass and that additional "satellite"
galaxies trace the dark matter distribution; similar assumptions are standard practice in the HOD
papers cited above and in galaxy clustering predictions based on N-body simulations with halos
populated according to semi-analytic models (Kauffmann et al. 1997; Kauffmann et al. 1999; Benson
et al. 2000; Somerville et al. 2001). We calculate the distribution of pair separations within each
halo (the function $F'(x)$ in equation (11) of Berlind & Weinberg 2002) by Monte Carlo
sampling of NFW halo realizations, assuming that the halos are spherical.

The 2-halo term is essentially the matter correlation function multiplied by the appropriately
weighted halo bias factor (Sheth, Mo, & Tormen 2001; an improvement on earlier results by Mo &
White 1996), with convolution to represent finite size of halos, and is calculated in Fourier space.
Relative to Seljak (2000) and Scoccimarro et al. (2001), there are three significant changes in our
calculation of the two-halo term. First, instead of the linear theory matter correlation function we
use the non-linear correlation function, and make use of the non-linear power spectrum given by
Smith et al. (2003). Second, we approximately incorporate the effects of halo exclusion by including
in the 2-halo term at separation $r$ only those halos whose virial radii are $R_{\rm vir} < r/2$ (similar to
Takada & Jain 2003). Third, we incorporate scale dependence of the halo bias factor on non-linear
scales, using an empirical formula $b_h^2(M, r) = [1 + 0.2\,\xi_m(r)]^{-0.5}\, b_{h,\rm lin}^2(M)$ obtained by matching the
halo correlation functions of the GIF ΛCDM simulation (Jenkins et al. 1998). Here $\xi_m(r)$ is the
non-linear matter correlation function, and $b_{h,\rm lin}(M)$ is the large scale bias factor given by Sheth,
Mo, & Tormen (2001) for halos of mass $M$. The ratio of the non-linear $\xi_m(r)$ to the linear theory
$\xi(r)$ is $\sim 0.75-0.8$ on scales of several $h^{-1}$ Mpc, so it is essential to use the former when modeling
data with the precision of the SDSS measurements. Once the non-linear $\xi_m(r)$ is used, it is essential
to account for halo exclusion and scale-dependent bias to obtain acceptable results on small scales.
We present a test of the accuracy of our analytic approximation in the Appendix, demonstrating
that it is adequate to our purposes in this paper.

³ We use $c(M_*) = 11$ rather than 9 to account for our definition of halos as enclosing a sphere of mean overdensity
200, instead of the value 340 used by Bullock et al. (2001). We choose 200, in turn, because this definition more
nearly corresponds to the one used in estimating the halo mass function.

For the halo occupation distribution itself, we adopt a simple model loosely motivated by
results from smoothed particle hydrodynamic (SPH) simulations and semi-analytic calculations,
e.g., the models of Kauffmann et al. (1997) and Benson et al. (2000), the fits of the Kauffmann
et al. (1999) models by Seljak (2000) and Scoccimarro et al. (2001), the SPH results of White,
Hernquist, & Springel (2001) and Yoshikawa et al. (2001), and the detailed comparison between
SPH and semi-analytic predictions by Berlind et al. (2003). We assume that the mean occupation
in halos of mass $M > M_1$ is a power law, $\langle N \rangle_M = (M/M_1)^\alpha$, and that halos in the mass range
$M_{\min} < M < M_1$ contain a single galaxy above the luminosity threshold. The theoretical models
cited above predict that the width of the distribution $P(N|\langle N \rangle)$ at fixed halo mass is substantially
narrower than a Poisson distribution when the mean occupation is low, making the mean number of
pairs $\langle N(N-1) \rangle_M$ lower than the Poisson expectation $\langle N \rangle^2$. This suppression of pairs in low mass
halos has an important influence on the predicted correlation function. For our baseline model,
we assume that the actual occupation for a halo of mass $M$ is one of the two integers bracketing
$\langle N \rangle_M$, though we will discuss some alternative cases in §4 below. As noted earlier, we assume that
the first galaxy in any halo resides at the center of mass and that any remaining galaxies trace
the dark matter within the halo. For given values of $\alpha$ and $M_1$, we choose the value of $M_{\min}$ to
match the observed number density of $M_{^{0.1}r} < -21$ galaxies, $n = 9.9 \times 10^{-4}\,h^3$ Mpc$^{-3}$. Thus, there
are two parameters ($\alpha$ and $M_1$) that can be varied to fit the correlation function. Of course there
are many other parameters required to describe the cosmological model, the concentration-mass
relation, and so forth, but all of these were chosen based on independent observational or theoretical
considerations; we made our default choices before starting to model $w_p(r_p)$ and did not adjust any
of them in order to fit the data.
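
The occupation model just described can be written compactly. The Python sketch below is an illustration under stated assumptions and not the fitting code: the mean occupation is zero below $M_{\min}$, one between $M_{\min}$ and $M_1$, and $(M/M_1)^\alpha$ above $M_1$, and the pair count assumes that $N$ takes only the two integers bracketing $\langle N \rangle$, with probabilities fixed by the mean. Function names are hypothetical.

```python
import math

def mean_occupation(mass, m_min, m_1, alpha):
    """Mean number of galaxies in a halo of the given mass (h^-1 Msun)."""
    if mass < m_min:
        return 0.0
    if mass < m_1:
        return 1.0
    return (mass / m_1) ** alpha

def mean_pairs_nearest_integer(n_mean):
    """<N(N-1)> when N is one of the two integers bracketing n_mean.

    With n = floor(n_mean) and f = n_mean - n, N equals n with probability
    1 - f and n + 1 with probability f, giving n (n - 1) + 2 f n.
    """
    n_low = math.floor(n_mean)
    frac = n_mean - n_low
    return n_low * (n_low - 1) + 2.0 * frac * n_low
```

For a halo with $\langle N \rangle = 1.5$ this gives $\langle N(N-1) \rangle = 1$, compared with the Poisson value $\langle N \rangle^2 = 2.25$, which illustrates the suppression of pairs in low-occupancy halos noted in the text. Halos with $\langle N \rangle \le 1$ contribute no pairs at all.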

Our assumptions about the form of the HOD are restrictive and are unlikely to be exactly
correct. However, they are reasonably motivated by current theoretical models, and they yield a
2-parameter description that can be fairly well constrained by $w_p(r_p)$ measurements. The assumption
that $\langle N \rangle_M$ is flat between $M_{\min}$ and $M_1$ is clearly artificial, but because halos with $M < M_1$ do
not contribute to the 1-halo term of $\xi(r)$, our results are insensitive to the form of $\langle N \rangle_M$ in this
"single occupancy" regime; we have confirmed this expectation by considering alternative forms for
$\langle N \rangle_M$ in the range where $\langle N \rangle < 1$. The important quantity is the overall fraction of galaxies in
halos with $M < M_1$, since this directly affects the normalization of the 1-halo term. Our modeling
approach is similar in spirit to the "conditional luminosity function" studies of Yang, Mo, & van
den Bosch (2003) and van den Bosch, Mo, & Yang (2003), but here we focus on luminosities
$M_{^{0.1}r} < -21$ instead of simultaneously modeling the luminosity function and luminosity-dependent
clustering, and we use the full correlation function $w_p(r_p)$ as a constraint instead of the correlation
length $r_0$ alone. When other clustering measurements such as higher order correlations and the
group multiplicity function are included, it is possible to constrain HOD models with much more
freedom, and to simultaneously constrain the cosmological model (Berlind & Weinberg 2002; Z.
Zheng & D. H. Weinberg, in preparation). We leave this effort to future work, when a wider range
of complementary measurements is available.

Figure 2 illustrates the behavior of the real space galaxy correlation function $\xi(r)$ for varying
choices of the model parameters $M_1$ and $\alpha$. For each combination, the value of $M_{\min}$ is chosen
by matching the observed number density of $M_{^{0.1}r} < -21$ galaxies. Figure 2a shows the effect
of varying $\alpha$ and Figure 2b the effect of varying $M_1$. The central model in each panel has the
parameters that yield the best fit to the $w_p(r_p)$ data points of Figure 1, as we discuss in §4.

For the central model, Figure 2 plots the 1-halo and 2-halo contributions to $\xi(r)$ in addition
to the total. For other models, only the 1-halo terms and the total are shown; the 2-halo terms are
similar but not identical to that of the central model. The 1-halo terms have a nearly power-law
form at small scales, but they cut off fairly steeply at separations approaching the virial diameter of
large halos, a consequence of the rapidly falling halo mass function. On large scales, the 2-halo term
traces the shape of the matter correlation function, then it flattens and cuts off at $r \sim 1-2\,h^{-1}$ Mpc
as a consequence of halo exclusion and the scale-dependent halo bias described above. For higher $\alpha$
or lower $M_1$, a larger fraction of galaxies reside in massive, high-occupancy halos with large virial
radii, so the 1-halo term has higher amplitude and extends to larger $r$. Regardless of the specific
parameter values, the transition from the 2-halo regime to the 1-halo regime represents a transition
from a function that is flattening and cutting off to a function that is rising steeply. Thus, these
models generically predict a change in the slope of the correlation function at scales comparable to
the virial diameters of large halos. The strength and location of this break depend on the relative
fractions of galaxies in high and low mass halos.



4. Fitting the Observations 

Figure 3 compares the observed $w_p(r_p)$ of the $M_{^{0.1}r} < -21$ sample to the model prediction
for parameter values $M_1 = 4.74 \times 10^{13}\,h^{-1}\,M_\odot$ and $\alpha = 0.89$. These values are determined by a
maximum-likelihood fit to the data points incorporating the covariance matrix derived from the
mock catalogs. Matching the observed number density of the sample requires
$M_{\rm min} = 6.10 \times 10^{12}\,h^{-1}\,M_\odot$, and the fraction of galaxies in halos with $M < M_1$ is 75%. The $\chi^2$ value of the fit
is 9.3 for 10 degrees of freedom (12 data points minus the 2 parameters that are varied to fit the
correlation function), or $\chi^2/{\rm d.o.f.} = 0.93$. Thus, the HOD model yields a statistically acceptable
fit to the data, and with the same number of free parameters as the power law, it fits the data
significantly better ($\Delta\chi^2 = 51.9$). The lower panel of Figure 3 shows the ratio of the data points
and the HOD model to the best-fit power law, from which one can see that the model predicts just
the sort of dip at $\sim 1\text{--}2\,h^{-1}$ Mpc and bulge at several $h^{-1}$ Mpc that is observed in the data.
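
The fit statistic quoted above can be illustrated with a short sketch of a generalized $\chi^2$ that incorporates the covariance between data points. This is a minimal Python illustration rather than the fitting code used for the paper: the function names `solve_linear` and `chi_square` and the two-point toy inputs are hypothetical, and only the final arithmetic uses numbers quoted in the text.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's inputs are not modified.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def chi_square(data, model, cov):
    """Return (d - m)^T C^{-1} (d - m) for data d, model m, covariance C."""
    resid = [d - m for d, m in zip(data, model)]
    weighted = solve_linear(cov, resid)  # C^{-1} (d - m) without forming C^{-1}
    return sum(r * w for r, w in zip(resid, weighted))


# Toy example (hypothetical numbers): two positively correlated data points.
toy_chi2 = chi_square([2.0, 3.0], [1.0, 2.0], [[2.0, 1.0], [1.0, 2.0]])

# Arithmetic behind the fit statistics quoted in the text.
n_dof = 12 - 2                    # 12 data points minus 2 fitted parameters
chi2_hod = 9.3                    # HOD model
chi2_power_law = 6.12 * n_dof     # power law, from its reduced chi^2
delta_chi2 = chi2_power_law - chi2_hod
```

In the toy case the positive correlation lowers $\chi^2$ from 1 (the value for a diagonal covariance with the same variances) to 2/3, which shows why the off-diagonal terms matter when judging the quality of a fit.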

Fig. 2. Real space galaxy correlation functions for HOD models with $M_1 = 4.74 \times 10^{13}\,h^{-1}\,M_\odot$
and varying values of $\alpha$ (left), and for $\alpha = 0.89$ and varying values of $M_1$ (right). For each model
we plot the total $\xi(r)$ (upper curves) and the 1-halo contribution (lower curves). The dotted curve
shows the 2-halo contribution for the central model; this contribution is similar but not identical
in the other models. In all models, the parameter $M_{\rm min}$ is adjusted to keep the space density fixed
at $n = 9.9 \times 10^{-4}\,h^3\,{\rm Mpc}^{-3}$.

Fig. 3. Projected correlation function for the $M_{^{0.1}r} < -21$ sample together with the predicted
correlation function for the best-fit HOD model, with parameters $\alpha = 0.89$, $M_1 = 4.74 \times 10^{13}\,h^{-1}\,M_\odot$,
and $M_{\rm min} = 6.10 \times 10^{12}\,h^{-1}\,M_\odot$. The reduced $\chi^2$ for this 2-parameter fit is $\chi^2/{\rm d.o.f.} = 0.93$, while
the reduced $\chi^2$ for the power-law fit shown by the solid line in Figure 1 is $\chi^2/{\rm d.o.f.} = 6.12$. The
lower panel shows the data and model prediction divided by this best-fit power law. In the upper
panel, dotted curves show the 1-halo and 2-halo contributions to $w_p(r_p)$ and the dashed curve shows
the projected correlation function for the matter computed from the nonlinear power spectrum of
Smith et al. (2003).

The error bars on the model parameters (defined by $\Delta\chi^2 = 1$) are $\pm 0.05$ in $\alpha$ and
$\pm 0.5 \times 10^{13}\,h^{-1}\,M_\odot$ in $M_1$. These errors are strongly correlated, but the mean occupation at
$M = 10^{14.5}\,h^{-1}\,M_\odot$ is constrained to $\log_{10}(N_{14.5}) = 0.733 \pm 0.007$, with an error that is nearly
uncorrelated with $\alpha$. If we use the jackknife covariance matrix estimated from the data instead of the mock
catalog covariance matrix, we obtain a very similar fit with nearly the same $\chi^2$. If we use the mock
catalog covariances without the scaling described in §2, we obtain a very similar fit with a lower
$\chi^2$. A mean multiplicity of 5.4 at $10^{14.5}\,h^{-1}\,M_\odot$ might look low at first glance, but our luminosity
threshold is fairly high ($\sim 1.5 L_*$), and this multiplicity is reasonably consistent with the number of
comparably luminous galaxies in Virgo (Trentham & Tully 2002) and with the measured richness or
luminosity of SDSS clusters at a similar cumulative space density of $n(>M) = 6.4 \times 10^{-6}\,h^3\,{\rm Mpc}^{-3}$
(Bahcall et al. 2003).
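
The quoted multiplicity follows directly from the best-fit parameters, as the short Python sketch below shows. It assumes the plateau/power-law form of the mean occupation (zero below $M_{\rm min}$, one galaxy between $M_{\rm min}$ and $M_1$, and $(M/M_1)^\alpha$ above $M_1$). The function name `mean_occupation` and the treatment of the boundaries at exactly $M_{\rm min}$ and $M_1$ are illustrative choices, not taken from the paper.

```python
import math

# Best-fit values quoted in the text (masses in h^-1 Msun).
M_MIN = 6.10e12
M_1 = 4.74e13
ALPHA = 0.89


def mean_occupation(mass, m_min=M_MIN, m_1=M_1, alpha=ALPHA):
    """Mean number of galaxies above the luminosity threshold in a halo.

    Assumed plateau/power-law form: 0 below m_min, 1 on the plateau,
    (mass / m_1)**alpha above m_1.
    """
    if mass < m_min:
        return 0.0
    if mass <= m_1:
        return 1.0
    return (mass / m_1) ** alpha


# Mean occupation of a 10^14.5 h^-1 Msun halo and its base-10 logarithm.
n_145 = mean_occupation(10 ** 14.5)
log_n_145 = math.log10(n_145)
```

Evaluating this gives a mean occupation close to 5.4 and a logarithm close to 0.733, in agreement with the values quoted above.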

The HOD model that we have fit to the data is not unique, since we could have adopted a
different form for $\langle N \rangle_M$, or for the width of the distribution at fixed $M$, or for the internal
distribution of galaxies within halos. For example, if we change the normalization of the $c(M)$ relation
from $c(M_*) = 11$ to 20 or 5, or the index from $-0.13$ to 0 or $-0.25$, then we still get acceptable
(though slightly worse) fits to the $w_p(r_p)$ data, but with changes of $\sim 0.1$ in $\alpha$ and associated changes
in $M_1$ and $M_{\rm min}$. Increasing halo concentrations shifts 1-halo pairs towards smaller separations,
and this change can be compensated by putting more galaxies into halos with large virial radii.
We have also considered a model for $P(N|\langle N \rangle)$ that closely tracks the predictions of semi-analytic
models and SPH simulations (Kauffmann et al. 1999; Benson et al. 2000; Seljak 2000; Scoccimarro
et al. 2001; Berlind et al. 2003), in which the width climbs steadily from nearest-integer at $\langle N \rangle \sim 1$
to Poisson at high $N$, with the transition halfway complete at $\langle N \rangle \sim 4$. We again find that we
can fit the data nearly as well as with our baseline model, with only slight changes to the $\langle N \rangle_M$
parameters. We are also able to fit $w_p(r_p)$ well using Kravtsov et al.'s (2004) proposed parameterization
of a step-function $\langle N \rangle_M$ for central galaxies and a power-law $\langle N \rangle_M$ for satellites, instead of
the plateau/power-law form for the full population that we adopt here.

The most important lesson to be learned from these alternative fits is that all of them produce
a very similar $w_p(r_p)$, with an inflection at $r_p \sim 1\text{--}2\,h^{-1}$ Mpc that always marks the transition
from the 1-halo regime of the correlation function to the 2-halo regime. Thus, this interpretation of
the observed feature in $w_p(r_p)$ is not sensitive to the details of our HOD model or our calculational
method. Our account parallels Seljak's (2000) proposed explanation of the inflection in the observed
galaxy power spectrum (Peacock 1997). We have not examined alternative cosmological parameter
choices because our analytic approximation is calibrated against a specific N-body simulation,
but we anticipate that modest changes in the normalization $\sigma_8$ would still allow successful fits to
$w_p(r_p)$, with compensating changes in $\langle N \rangle_M$. Substantial changes to the shape of the matter power
spectrum, on the other hand, might be impossible to accommodate.

We have also investigated a model in which the distribution $P(N|\langle N \rangle)$ is Poisson instead of
nearest-integer, and in this case we can find no combination of $M_1$ and $\alpha$ that comes close to fitting
the $w_p(r_p)$ data. Thus, we confirm earlier arguments that the sub-Poisson fluctuations predicted
by the leading galaxy formation models are essential to reproducing observed galaxy clustering.
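
The difference between the two occupation distributions can be made concrete by comparing their pair moments $\langle N(N-1) \rangle$, the quantity that weights the number of same-halo galaxy pairs in standard halo-model calculations. The Python sketch below assumes that "nearest-integer" means $N$ takes only the two integers bracketing $\langle N \rangle$, with probabilities chosen to reproduce the mean. The function names are hypothetical and the example mean of 1.5 is purely illustrative.

```python
import math


def nearest_integer_pmf(mean_n):
    """Probabilities for N restricted to the two integers bracketing mean_n."""
    low = math.floor(mean_n)
    frac = mean_n - low
    if frac == 0.0:
        return {low: 1.0}
    # P(low) and P(low + 1) are fixed by requiring <N> = mean_n.
    return {low: 1.0 - frac, low + 1: frac}


def pair_moment(pmf):
    """<N(N-1)> for a distribution given as {N: probability}."""
    return sum(prob * n * (n - 1) for n, prob in pmf.items())


def poisson_pair_moment(mean_n):
    """<N(N-1)> for a Poisson distribution, which equals <N>^2."""
    return mean_n ** 2


# Illustrative halo with a mean occupation of 1.5 galaxies.
pairs_nearest = pair_moment(nearest_integer_pmf(1.5))
pairs_poisson = poisson_pair_moment(1.5)
```

For this example the nearest-integer distribution gives $\langle N(N-1) \rangle = 1.0$ against 2.25 for Poisson, so at low occupation a Poisson distribution places more than twice as many pairs inside single halos, boosting the small-scale correlation function.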

Gaztañaga & Juszkiewicz (2001) have also discussed deviations from a power-law correlation
function, based on the real space $\xi(r)$ that Baugh (1996; see also Padilla & Baugh 2003) obtained
by inverting the angular clustering measurements from the Automatic Plate Measuring (APM)
galaxy catalog (Maddox et al. 1990). The inflection point in the inverted APM $\xi(r)$ occurs at
$r \approx 5\,h^{-1}$ Mpc, which is larger than the scale of $r_p \approx 2\,h^{-1}$ Mpc where we find an inflection
in $w_p(r_p)$. We have not attempted to invert $w_p(r_p)$ to derive $\xi(r)$ directly, but the real space
correlation function of the best fitting HOD model changes slope most rapidly between 2 and
$4\,h^{-1}$ Mpc (Figure 2). Our methods are different, and a quantitative assessment of the discrepancy
is difficult, so while the scale of the feature we find appears to be somewhat smaller, it is not clear
that the APM and SDSS results are incompatible.

Gaztañaga & Juszkiewicz (2001) argue that the inflection of $\xi(r)$ is connected to the onset
of non-linear gravitational evolution, drawing on the pair conservation equation (Davis & Peebles
1977), and they conclude that the coincidence of this inflection scale with the galaxy correlation
length implies that APM galaxies trace the underlying mass distribution to a good approximation.
We associate the feature in $w_p(r_p)$ with the transition from the 2-halo regime of the correlation
function to the 1-halo regime, at a smaller, more highly nonlinear scale set by the virial diameters
of rare, massive halos. Gaztañaga & Juszkiewicz (2001) model the APM data using N-body
simulations by Baugh & Gaztañaga (1996) that have an initial power spectrum custom designed
to evolve into the observed APM power spectrum (using the methods of Peacock & Dodds [1994]
and Jain, Mo, & White [1995]). We have assumed instead that the underlying matter correlation
function, shown by the dashed line in Figure 3, is that of a $\Lambda$CDM cosmological model with
parameters favored by other observations. The correlation function of the $M_{^{0.1}r} < -21$ galaxies is
biased by a factor $b^2 \sim 2$ on large scales, and the bias is strongly scale-dependent in the non-linear
regime. Figure 4 plots the ratio $[\xi_{gg}(r)/\xi_{mm}(r)]^{1/2}$ for our best-fit model, which is similar in shape
to the "bias function" that Jenkins et al. (1998) concluded would be required to reconcile CDM
predictions with observations. While the scale dependence is itself complex, it emerges from a
simple HOD model with two free parameters that is motivated by the predictions of contemporary
galaxy formation theory. A strict mass-traces-light model, on the other hand, must choose a full
1-dimensional function, the initial power spectrum, specifically to match the observed correlation
function, and this function has no motivation from theory or other observations. Tests of our
model will soon be provided by additional clustering measurements such as the group multiplicity
function, higher order correlation functions, and dynamical group masses.

We have concentrated in this paper on the clustering of relatively luminous galaxies, and 
these exhibit stronger departures from a power-law correlation function than lower luminosity 
populations. In fact, hydrodynamic simulations and semi-analytic models predict just this behavior: 
departures from a power law are stronger for luminous, rare, strongly clustered galaxies than for 
lower luminosity populations of higher space density and lower clustering amplitude (Weinberg et 
al. 2004; Berlind et al. 2003). However, as noted above, we find similar signatures of the 1-halo 
to 2-halo transition in most of the other SDSS volume-limited samples we have analyzed, albeit 
at lower significance. We consistently find that HOD models of the sort developed here can fit 
the measured correlation functions as well as or better than power laws. We will present these 
results and their implications for the luminosity dependence of galaxy halo occupations elsewhere 
(I. Zehavi et al., in preparation). As noted in §2, Hawkins et al. (2003) find small deviations
from a power-law correlation function, similar to those found here, in their analysis of the full,
flux-limited 2dFGRS. The existence of similar features in independent analyses of the two largest
galaxy redshift surveys demonstrates their robustness, and the modeling presented here shows that
they are physically natural.

Fig. 4. "Bias function" defined by $b(r) = [\xi_{gg}(r)/\xi_{mm}(r)]^{1/2}$ for the best-fit HOD model, where
$\xi_{mm}(r)$ is the non-linear matter correlation function for our adopted cosmology.

The parameters of power-law fits to the galaxy correlation function have long been an important 
constraint on cosmological parameters and galaxy formation models. We anticipate, however, 
that $w_p(r_p)$ measurements of increasing precision will reveal departures from a power law that are
increasingly significant, for a variety of galaxy classes. These departures encode information about 
the number of galaxies as a function of halo mass, about the distribution of halo virial radii, and 
about the relative distributions of galaxies and dark matter within halos. We therefore expect that 
future measurements of the galaxy correlation function will yield ever richer information about 
cosmology and galaxy formation. 

A. Accuracy of the Analytic Approximation

We have used the GIF $\Lambda$CDM N-body simulation of Jenkins et al. (1998) to guide the development
of our analytic model for the correlation function and to test its accuracy. Figure 5 presents
an example of such a test, for an HOD model similar to the best-fit baseline model described in §4.
We identify halos in the GIF simulation using a friends-of-friends algorithm (Davis et al. 1985) with
linking parameter of 0.2, which selects systems of overdensity $\rho/\bar{\rho} \sim 200$. We choose the number
of galaxies in each halo based on the model $P(N|M)$, place the first galaxy at the halo center, and
choose random dark matter particles within the halo for other galaxies. Points show $\xi(r)$ for this
galaxy population, with $1\sigma$ error bars estimated by jackknife resampling of the eight octants of
the $141.3\,h^{-1}$ Mpc simulation cube. The solid curve shows the analytic model prediction for the
same HOD. It lies systematically above the numerical results at large $r$ because of the truncation
of large scale power on the scale of the simulation box. When this truncation is incorporated into
the analytic calculation (dot-dashed curve), the falloff of $\xi(r)$ at large $r$ is well reproduced. Dotted
and dashed curves show the 1-halo contributions from the simulation and the analytic model,
respectively. The analytic 1-halo term extends slightly further than the numerical one, probably
because of the absence of very high mass halos in the finite simulation volume.

Fig. 5. Test of the analytic correlation function model against the GIF N-body simulation of
Jenkins et al. (1998). Friends-of-friends halos in the GIF simulation are populated using an HOD
model like the one that best fits the SDSS $M_{^{0.1}r} < -21$ data. Points show the numerical results
with jackknife error bars, and the dotted line shows the 1-halo contribution alone. The solid line
shows the full analytic model prediction, with the dashed line indicating the 1-halo term. The
dot-dashed line shows the effect of truncating $P(k)$ at the size of the simulation cube when computing
the analytic prediction.
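
The jackknife error bars used in this test can be sketched in a few lines of Python. The sketch assumes the standard delete-one jackknife, in which $\xi(r)$ is remeasured with each octant omitted in turn and the variance of those estimates is scaled by $(n-1)/n$ summed over subsamples. The paper does not spell out its exact estimator, and the function name and the eight sample values below are hypothetical.

```python
import math


def jackknife_error(leave_one_out):
    """Delete-one jackknife mean and 1-sigma error.

    leave_one_out holds the statistic remeasured with each subsample
    (here, each octant of the simulation cube) omitted in turn.
    """
    n = len(leave_one_out)
    mean = sum(leave_one_out) / n
    # The (n - 1) / n factor accounts for the strong overlap between
    # the leave-one-out samples.
    variance = (n - 1) / n * sum((x - mean) ** 2 for x in leave_one_out)
    return mean, math.sqrt(variance)


# Hypothetical xi(r) values at one separation, one per omitted octant.
xi_without_octant = [1.02, 0.97, 1.05, 0.99, 1.01, 0.96, 1.03, 0.98]
xi_mean, xi_sigma = jackknife_error(xi_without_octant)
```

With only eight subsamples the resulting error estimate is itself noisy, which is worth keeping in mind when judging the agreement between the points and the curves in Figure 5.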

From this comparison, we conclude that the analytic model is accurate to the degree that we
are able to test it with this simulation. This test implies that our treatment of scale-dependent
halo bias and halo exclusion (see §3) is adequate for our present purposes. Residual inaccuracies of
$\sim 10\text{--}20\%$ could still be present at some separations. When it comes to fitting the data, inaccuracies
at this level could have a noticeable effect on our determinations of best-fit HOD parameters, but
their effect is comparable to that of changes in the halo $c(M)$ relation discussed in §4, and they are
unlikely to change our conclusions about the physical significance of the departures from a power-law
$w_p(r_p)$. We have chosen to base our fits on a numerically calibrated analytic model rather than
the populated GIF simulation itself for several reasons: the analytic approach provides us with a
well defined model that is not tied to the numerical details of a particular simulation, it is more
practical for maximum-likelihood parameter determinations, and it is not affected by truncation of
large scale power. Because $w_p(r_p)$ is defined by integrating $\xi(r_p, \pi)$ out to large separations, the
effect of this missing power is greater than that in Figure 5, depressing $w_p(r_p)$ by factors of $1.5\text{--}2$ at
$r_p = 10\text{--}20\,h^{-1}$ Mpc. The analytic model can in principle be applied to other cosmological models,
but we have not yet tested our form of the scale-dependent halo bias factor on other simulations, so
we do not know if it remains accurate for other cosmological parameters. Sheth & Lemson (1999)
and Casas-Miranda et al. (2002) discuss general expectations for the scale-dependence of halo bias.
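
The sensitivity of $w_p(r_p)$ to large separations can be seen in a minimal numerical sketch of the line-of-sight projection. It assumes the common convention $w_p(r_p) = 2 \int_0^{\pi_{\rm max}} \xi\left(\sqrt{r_p^2 + \pi^2}\right) d\pi$ for an isotropic real space $\xi(r)$. The function names, the toy correlation function $\xi(r) = (r_0/r)^2$ with $r_0 = 5\,h^{-1}$ Mpc, and the integration limit of $40\,h^{-1}$ Mpc are all illustrative assumptions, not values from the paper.

```python
import math


def projected_wp(xi, r_p, pi_max, n_steps=2000):
    """Project an isotropic xi(r) along the line of sight with Simpson's rule.

    Returns 2 * integral from 0 to pi_max of xi(sqrt(r_p^2 + pi^2)) d(pi).
    """
    if n_steps % 2:
        n_steps += 1  # Simpson's rule needs an even number of intervals
    step = pi_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        r = math.hypot(r_p, i * step)
        if i == 0 or i == n_steps:
            weight = 1.0
        elif i % 2:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * xi(r)
    return 2.0 * total * step / 3.0


def xi_toy(r, r_0=5.0):
    """Toy power-law correlation function with slope -2 (illustrative only)."""
    return (r_0 / r) ** 2


# For this toy xi the truncated projection has a closed form,
# 2 * r_0^2 * atan(pi_max / r_p) / r_p, which checks the integrator.
wp_numeric = projected_wp(xi_toy, r_p=1.0, pi_max=40.0)
wp_exact = 2.0 * 25.0 * math.atan(40.0 / 1.0) / 1.0

# Fraction of the untruncated projection recovered at r_p = 10.
fraction_at_10 = math.atan(40.0 / 10.0) / (math.pi / 2.0)
```

Even for this simple power law, cutting the integral at $40\,h^{-1}$ Mpc recovers only about 84% of the full projection at $r_p = 10\,h^{-1}$ Mpc, which illustrates why missing large scale power affects $w_p(r_p)$ more strongly than $\xi(r)$ at the same separation.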